Abstract
Genome-wide association studies (GWAS) have revealed heritable genetic risk factors for childhood acute lymphoblastic leukemia (ALL), highlighting the role of lymphoid development and cell cycle control genes in disease etiology. Most studies have focused on non-Latino white populations; however, the highest global rates of ALL are found in Latinos in the US and Latin America. To identify novel genetic susceptibility alleles, we carried out a large, multi-ethnic GWAS of ALL in the Californian population.
Childhood ALL cases were identified using the California Cancer Registry, and controls matched on age, gender, and ethnicity were identified from the Department of Vital Statistics. Newborn bloodspots from California-born children were obtained from the California Biobank Program. Genome-wide SNP genotyping was carried out for 1,949 Latino, 1,184 non-Latino white, and 130 African-American cases as well as 3,506 controls, using the Affymetrix Axiom World Latin array. Additional controls (n=12,471) from the Kaiser Resource for Genetic Epidemiology Research on Aging Cohort (GERA) were genotyped on the same array. For increased statistical power, we carried out a meta-analysis of the three ethnicity-stratified GWAS. Replication of putative novel loci was performed in two independent datasets: the European-ancestry-based Children's Oncology Group (COG) (959 cases, plus 2,624 controls from the Wellcome Trust Case-Control Consortium), and Latino subjects from the California Childhood Leukemia Study (CCLS) (530 cases, 511 controls). Local imputation across top association loci was carried out, using 1000 Genomes Project SNP data as a reference, to identify causal variants.
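The abstract does not state which meta-analysis model was used to combine the three ethnicity-stratified GWAS. As a minimal sketch, assuming a standard fixed-effect inverse-variance model, per-stratum log odds ratios for one SNP could be pooled as follows. The function name and the input values are hypothetical and purely illustrative; they are not results from this study.

```python
import math

def fixed_effect_meta(estimates):
    """Pool per-stratum (log_odds_ratio, standard_error) pairs for one SNP.

    Assumption: fixed-effect inverse-variance weighting; the abstract does
    not specify the model actually used.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, z, p

# Hypothetical per-stratum estimates (log OR, SE), NOT taken from the study.
strata = [(0.20, 0.05), (0.25, 0.06), (0.10, 0.20)]
beta, se, z, p = fixed_effect_meta(strata)
print(f"pooled OR={math.exp(beta):.3f}, SE={se:.4f}, z={z:.2f}, P={p:.2e}")
```

Under this weighting, a small stratum with a large standard error contributes little to the pooled estimate, which is one reason stratified analysis followed by meta-analysis can add power without letting the smallest group dominate.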
Our childhood ALL GWAS meta-analysis of 3,263 cases and 15,977 controls identified two novel genome-wide significant SNP loci (P<5.0×10⁻⁸) at chromosomes 8q24 (OR=1.27, 95% CI: 1.17-1.38) and 17q12 (OR=1.18, 95% CI: 1.11-1.25) that replicated in the COG and CCLS datasets. The 8q24 locus lies within a 2.5 Mb region containing known GWAS hits for multiple phenotypes, including several cancer types and developmental traits. Imputation narrowed the locus to a ~100 kb intergenic region, including top hit SNP rs4617118, that demonstrates chromatin looping with the MYC oncogene in lymphoblastoid cell lines. The 17q12 association peak encompasses the lymphoid transcription factor IKZF3, as well as ZPBP2, GSDMB, and ORMDL3. Top SNP rs2290400 disrupts a BLIMP-1 binding motif, and is a highly significant expression quantitative trait locus (eQTL) for both GSDMB and ORMDL3. SNP rs2290400 is also a GWAS hit for type 1 diabetes and asthma, with the ALL risk allele conferring protection against these autoimmune conditions. In addition to novel loci, we replicated established GWAS hits in ARID5B, IKZF1, CEBPE, PIP4K2A, and CDKN2A at genome-wide significance, and in GATA3 at P=5.2×10⁻⁶. Recently identified loci at LHPP and ELK3 also replicated, at P=5.7×10⁻⁶ and P=2.1×10⁻³, respectively.
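As a rough consistency check on the two novel loci, the reported odds ratios and 95% confidence intervals can be converted back to an approximate standard error, z score, and two-sided P value. This is a sketch under the assumption that the intervals are Wald intervals, symmetric on the log-odds scale; because the published figures are rounded to two decimals, the recovered P values are only approximate. The function name is hypothetical.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval

def z_and_p_from_or_ci(odds_ratio, ci_low, ci_high):
    """Approximate SE, z, and two-sided P from an OR and its 95% CI.

    Assumption: a Wald interval, i.e. log(OR) +/- Z_975 * SE.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# OR and CI values as reported in the abstract for the two novel loci.
reported = {"8q24": (1.27, 1.17, 1.38), "17q12": (1.18, 1.11, 1.25)}
for locus, (odds_ratio, low, high) in reported.items():
    se, z, p = z_and_p_from_or_ci(odds_ratio, low, high)
    print(f"{locus}: SE={se:.4f}, z={z:.2f}, approx. P={p:.1e}")
```

Both loci yield z scores above 5.4 under these assumptions, in line with the genome-wide significance threshold quoted above.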
Capitalizing on the large proportion of Latinos in our study, we utilized the reduced linkage disequilibrium of this genetically admixed population in combination with fine-mapping to identify new putative causal variants at known ALL-associated loci.
Our large, multi-ethnic GWAS of childhood ALL has identified two novel genetic associations, at loci that function in hematopoiesis (17q12) and B-cell proliferation (8q24/MYC), in addition to pinpointing putative causal variants at known ALL loci. Further work is required to elucidate the biological mechanisms underlying these associations. Our results suggest that analysis of larger ALL datasets may yield additional genetic risk loci of moderate effect size.
Ma: Incyte Corp.: Consultancy.